Fizz Buzz with TensorFlow

This notebook explains the code from the Fizz Buzz in TensorFlow blog post written by Joel Grus.
You should read his post first; it is super funny!

His code tries to play the Fizz Buzz game using machine learning.

This notebook is for real beginners who want to understand the basics of TensorFlow by reading code.
Feedback welcome @dh7net

Let's start!

The code contains several parts:

  • Create the training set
    • Encode the input (a number)
    • Encode the result (fizz or buzz, none or both?)
    • Create the training set
  • Build a model
  • Train the model
    • Create a cost function
    • Iterate
  • Make predictions

In [1]:
import numpy as np
import tensorflow as tf

Create the training set

Encode the input (a number)

This example converts the number to a binary representation.


In [2]:
NUM_DIGITS = 10

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

# Let's check if it works
for i in range(10):
    print i, binary_encode(i, NUM_DIGITS)


0 [0 0 0 0 0 0 0 0 0 0]
1 [1 0 0 0 0 0 0 0 0 0]
2 [0 1 0 0 0 0 0 0 0 0]
3 [1 1 0 0 0 0 0 0 0 0]
4 [0 0 1 0 0 0 0 0 0 0]
5 [1 0 1 0 0 0 0 0 0 0]
6 [0 1 1 0 0 0 0 0 0 0]
7 [1 1 1 0 0 0 0 0 0 0]
8 [0 0 0 1 0 0 0 0 0 0]
9 [1 0 0 1 0 0 0 0 0 0]
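
Note that the bits come out least-significant first: i >> d shifts the number right by d places, and & 1 keeps only the last bit. As a quick cross-check (not part of the original notebook), Python's built-in binary formatting gives the same bits once reversed:

# format(i, 'b') prints the most-significant bit first, so reverse it to compare
i = 9
bits = [int(b) for b in format(i, 'b').zfill(NUM_DIGITS)[::-1]]
print(bits)  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0], same as binary_encode(9, 10)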

Encode the result (fizz or buzz, none or both?)

  • The fizz_buzz_encode function calculates what the output should be and encodes it as a 4-dimensional vector.
  • The fizz_buzz function takes a number and a prediction, and outputs a string.

In [3]:
def fizz_buzz_encode(i):
    if   i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5  == 0: return np.array([0, 0, 1, 0])
    elif i % 3  == 0: return np.array([0, 1, 0, 0])
    else:             return np.array([1, 0, 0, 0])
    
def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]
    
# let's see how the encoding works
for i in range(1, 16):
    print i, fizz_buzz_encode(i)


1 [1 0 0 0]
2 [1 0 0 0]
3 [0 1 0 0]
4 [1 0 0 0]
5 [0 0 1 0]
6 [0 1 0 0]
7 [1 0 0 0]
8 [1 0 0 0]
9 [0 1 0 0]
10 [0 0 1 0]
11 [1 0 0 0]
12 [0 1 0 0]
13 [1 0 0 0]
14 [1 0 0 0]
15 [0 0 0 1]

In [4]:
# and the decoding
for i in range(1, 16):
    fizz_or_buzz_number = np.argmax(fizz_buzz_encode(i))
    print i, fizz_or_buzz_number, fizz_buzz(i, fizz_or_buzz_number)


1 0 1
2 0 2
3 1 fizz
4 0 4
5 2 buzz
6 1 fizz
7 0 7
8 0 8
9 1 fizz
10 2 buzz
11 0 11
12 1 fizz
13 0 13
14 0 14
15 3 fizzbuzz

Create the training set


In [5]:
training_size = 2 ** NUM_DIGITS
print "Size of the set:", training_size
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, training_size)])
trY = np.array([fizz_buzz_encode(i)          for i in range(101, training_size)])

print "First 15 values:"
for i in range(101, 116):
    print i, trX[i - 101], trY[i - 101]  # trX[0] corresponds to the number 101


Size of the set: 1024
First 15 values:
101 [1 0 1 0 0 1 1 0 0 0] [1 0 0 0]
102 [0 1 1 0 0 1 1 0 0 0] [0 1 0 0]
103 [1 1 1 0 0 1 1 0 0 0] [1 0 0 0]
104 [0 0 0 1 0 1 1 0 0 0] [1 0 0 0]
105 [1 0 0 1 0 1 1 0 0 0] [0 0 0 1]
106 [0 1 0 1 0 1 1 0 0 0] [1 0 0 0]
107 [1 1 0 1 0 1 1 0 0 0] [1 0 0 0]
108 [0 0 1 1 0 1 1 0 0 0] [0 1 0 0]
109 [1 0 1 1 0 1 1 0 0 0] [1 0 0 0]
110 [0 1 1 1 0 1 1 0 0 0] [0 0 1 0]
111 [1 1 1 1 0 1 1 0 0 0] [0 1 0 0]
112 [0 0 0 0 1 1 1 0 0 0] [1 0 0 0]
113 [1 0 0 0 1 1 1 0 0 0] [1 0 0 0]
114 [0 1 0 0 1 1 1 0 0 0] [0 1 0 0]
115 [1 1 0 0 1 1 1 0 0 0] [0 0 1 0]

Build the model

The model is made of:

  • one hidden layer that contains 100 neurons
  • one output layer

The input is fully connected to the hidden layer, and a ReLU function is applied.
The ReLU function is a rectifier: it outputs zero if the input is negative, and the input itself otherwise.
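
For example (a NumPy illustration, not part of the model code):

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(np.maximum(x, 0))  # [0. 0. 0. 1.5] -- negatives clipped to zero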

First we'll define a helper function to initialize the parameters with random values.


In [6]:
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))
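
The small stddev=0.01 keeps the initial weights close to zero but still different from each other. A rough NumPy analogue of what tf.random_normal produces (an illustration only):

# each entry drawn from a normal distribution with mean 0 and stddev 0.01
w = np.random.normal(loc=0.0, scale=0.01, size=(NUM_DIGITS, NUM_HIDDEN))
print(w.shape)  # (10, 100)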

  • X is the input
  • Y is the output
  • w_h are the parameters between the input and the hidden layer
  • w_o are the parameters between the hidden layer and the output


In [7]:
NUM_HIDDEN = 100 # Number of neurons in the hidden layer

X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])

w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])

To create the model we apply the w_h parameters to the input, and then we apply the ReLU function to calculate the values of the hidden layer.

The w_o coefficients are used to calculate the output layer; no rectification is applied there.
py_x is the predicted value for a given input, represented as a 4-dimensional vector.


In [8]:
def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))  # hidden layer: weighted sum of the inputs, then ReLU
    return tf.matmul(h, w_o)           # output layer: weighted sum of the hidden values, no activation

py_x = model(X, w_h, w_o)
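
To follow the tensor shapes through the model, here is a NumPy sketch of the same computation with random dummy data (an illustration only; the real model uses the TensorFlow graph above):

X_np = np.random.rand(128, NUM_DIGITS)  # a batch of 128 encoded numbers
h_np = np.maximum(X_np.dot(np.random.randn(NUM_DIGITS, NUM_HIDDEN) * 0.01), 0)  # ReLU hidden layer
py_x_np = h_np.dot(np.random.randn(NUM_HIDDEN, 4) * 0.01)  # one score per class
print(h_np.shape)     # (128, 100)
print(py_x_np.shape)  # (128, 4)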

Training

Create the cost function

The cost function measures how bad the model is.
It is the distance between the prediction (py_x) and the reality (Y).


In [9]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, Y))
  • softmax_cross_entropy_with_logits(py_x, Y) measures the distance between py_x and Y. Softmax cross-entropy is the classical way to measure the distance between a predicted result and the actual result in a cost function (see the NumPy sketch below).
  • reduce_mean calculates the mean of a tensor; in this case, the mean of that distance over the whole batch.
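
To make this concrete, here is a minimal NumPy sketch of what the op computes for a single example (an illustration, not TensorFlow's actual implementation):

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(logits, one_hot_label):
    # the lower the probability given to the true class, the higher the cost
    return -np.sum(one_hot_label * np.log(softmax(logits)))

logits = np.array([2.0, 1.0, 0.1, -1.0])  # raw scores, like one row of py_x
label = np.array([1, 0, 0, 0])            # like the output of fizz_buzz_encode
print(cross_entropy(logits, label))       # small when the top score is the true class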

Train the model

Training a model in TensorFlow is extremely simple: you just define a training operator!


In [10]:
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

This operator will minimize the cost using gradient descent, which is the most common way to find parameters that minimize the cost. The value 0.05 is the learning rate.
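
Each gradient-descent step moves every parameter a little in the direction that lowers the cost. A hand-written sketch of one update with made-up numbers (TensorFlow computes the gradients and applies this for us):

learning_rate = 0.05
w = np.array([0.3, -0.2])      # some parameters
grad = np.array([0.1, -0.4])   # pretend gradient of the cost w.r.t. w
w = w - learning_rate * grad   # step against the gradient
print(w)                       # [ 0.295 -0.18 ]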

We'll also define a prediction operator that turns the 4 output scores into a single class number.

  • 0 means no fizz no buzz
  • 1 means fizz
  • 2 means buzz
  • 3 means fizzbuzz

In [11]:
predict_op = tf.argmax(py_x, 1)
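
tf.argmax(py_x, 1) returns, for each row, the index of the highest score. A NumPy equivalent with made-up scores:

scores = np.array([1.2, 3.4, 0.1, -2.0])  # a hypothetical row of py_x
print(np.argmax(scores))                  # 1, which fizz_buzz decodes to "fizz"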

Iterate until the model is good enough

One epoch consists of one full training cycle on the training set. Once every sample in the set has been seen, you start again, marking the beginning of the 2nd epoch.

The training set is randomly permuted before each epoch.

Learning is not done on the full set at once.
Instead, the training set is divided into small batches, and one learning step is done on each of them.


In [12]:
BATCH_SIZE = 128

Here is an example of the indexes used for one epoch. With 923 training examples (1024 - 101) and a batch size of 128, an epoch contains 8 batches; the last one only holds the remaining 923 - 7 × 128 = 27 indexes.


In [13]:
# a random permutation of the indexes will be used during training for each epoch
permutation_index = np.random.permutation(range(len(trX)))
for start in range(0, len(trX), BATCH_SIZE):
    end = start + BATCH_SIZE
    print "Batch starting at", start
    print permutation_index[start:end]


Batch starting at 0
[417 122 664 322 289 778 679 654 550 363 489 172  24 795 105 333 700 198
 404 339 471 402 240 859 907 666 495 661 153 296  23  77 423  61 603   9
 895 254 434 642 807 594 624 248 596 611  35 315 532 866 862 231 705 108
 204 908 499 552 408 657 651 405 525 734 130 228  53 569 593 391 653 464
 803 415 632  62 837 458 775  85 148 452 620 367 628 892 576 675 825  73
 758 441 200 512 529 393 640 138 112  47 717 890 432 854 896 119 521 430
 874 416 708 650 167 707  94 149 390 765 462 541 201 210 217 223 488  44
 286  58]
Batch starting at 128
[718  36 401 291 680 782 145 886 508 898 563 481 351 400 539 757 513 809
 540  92 213 706 451 163 534 601 371 186  68 909  93 362 728 482 698 337
 438 900  45 828 143 395 602 487 365 236  66 466 287 715 811 799 299 480
 591 251 668 135 867 460 870 492 134 192 566 743 822 597 577 440 600 791
 738 326  67 733 607 334 370 789 629 102 776   2 677 701 683  83 899 814
 414 695 835 188 104 522 604 469 574 730 283 781 769 794 465 109 903  65
 411 801 338 290 741 118 638 273 181 302 581 721 536 468 303 344 332 484
 245 265]
Batch starting at 256
[ 97 505 307  90 472 843 136 726 146 547 241 709 735 687 323 369 524 846
 678  88 151 711 125 229 314 788  59 819 357 572 724 792 450 313 324 686
 437 796 614 630  54  79 821 779 384 635 221 787 647 378 644 166 479 179
 546 353 444 266  72 320 203 209 714 684 838 412 298 463 761 260 573 582
 257 249 641 562  11 375 270  84 847 117 304 774 195 531 199 387 158 818
 114 920 162 381 356 606 259 335 912 377 627 282 191  76  60 301 518 128
 316  37 740 374 622 349 455 592  21 100 857 623 855 749 493  46 475 202
 906 501]
Batch starting at 384
[823 528 634 771  16  71 720 295 588 831 461 427 261 780 808 889 609 506
 618 154 567 615 570 824 691 820  29 159 293 551 311 643  99 507 403 883
 503 856 425 802 568   6 913 610 810 246 915 269 113 851 688  40 225 659
 860 277 214 748 319 490 584 234 565 230 674 509  64 914 876 272   5 511
 354 858 504 330 187 599 278 267 665 439 616 553 813 878 137  18 587 312
 621 554 523   7 537 772 388 797 285 682  91 790 555 853 916 182 237 545
 699  13 459 215  15 474  98   1 731 871 258 454 389 861 347 494 219 756
 350  81]
Batch starting at 512
[737 238 232 255 864 905 276 716 396 739  20 613 893  51 713 111 120 842
 428  86 655 839 904 564 280 306 736 147 331  32 308 189 150 190 227 300
 784 921 798 346 770 325 514 161 262 127  38 881  63 631 141 252 762 844
 559 359 342 443 732 722 806  39 841 271 502 168 873 755 208 800 264 413
 885 394 827 729 542 918 773 619 543 380 561 115 702 183 829 321 583  56
 279 180 341 431 343  55 207 764 133 783 205 719 498 284 470 826 863 447
 693 419 845 317   4 392 305 767 176 101 517 446 226 406 672 852  57 919
 608 598]
Batch starting at 640
[123 491 329 667 681 422  12 865 379 426 435  78 636 696  22 578 637 256
 177 744 922 164   8  87  89 694 420 250 473 515 193 222 697 887 612 328
 456 585 558 382 777 785 424 723 268 727 891 442 467 617 595 410 690  25
 297  14  17 361 368 399 373 385 646 364 185 793 834 310 519 160 496  42
 850 383 662 575 107 888  52 712 318 648 548 884  30 786  48 156 516 768
 660 242 911 397 358 376 340 235 658 483 309 549 882 449 703 103  74 211
 747 586 671  69 902 429 910 804 348 742 901 233 704 247  50 544 605 142
  33 327]
Batch starting at 768
[212 830 216 751   0 144 633 366 833 418 917 849 140 486 880  80 692  75
 639 131 178 124 579 590 877 110 759 589  95 875  10 485 372 526 894 763
 560 184 218 836  27 448 663 132 710 152 868 497 126   3 165 294 157 139
 848 872  49 530 673 121 355 288 174 407 535 170 816  34 453 155 753  41
 457 626  43 897 253 220 433 538  82 752 436 274 805 760 175 685 676 500
 476 244 409 527 817 840 652  19 556 869 533 670 520 239 173 129 243 725
 386 689 197  28  26  96 812 510 292 360 224 171 106 746 571 336 766 169
 478 649]
Batch starting at 896
[345 116 398 281 263 206 477 196 352 656 754 625 745 669 879 445 275 421
 750 832  31 580 815 194 557 645  70]

In [14]:
# Launch the graph in a session
sess = tf.Session()
tf.initialize_all_variables().run(session=sess)

for epoch in range(5000):
    # Shuffle the data before each training iteration.
    p = np.random.permutation(range(len(trX)))
    trX, trY = trX[p], trY[p]

    # Train in batches of 128 inputs.
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

    # And print the current accuracy on the training data.
    if epoch % 100 == 0:  # every 100 epochs, to avoid overflowing the Jupyter log
        # np.mean(A==B) return a number between 0 and 1. (true_count/total_count)
        print(epoch, np.mean(np.argmax(trY, axis=1) ==
                         sess.run(predict_op, feed_dict={X: trX, Y: trY})))


(0, 0.5297941495124594)
(100, 0.53412784398699886)
(200, 0.53412784398699886)
(300, 0.53412784398699886)
(400, 0.53412784398699886)
(500, 0.53412784398699886)
(600, 0.53412784398699886)
(700, 0.53846153846153844)
(800, 0.53954496208017333)
(900, 0.54929577464788737)
(1000, 0.55254604550379194)
(1100, 0.55579631635969662)
(1200, 0.56338028169014087)
(1300, 0.59046587215601298)
(1400, 0.61971830985915488)
(1500, 0.64138678223185264)
(1600, 0.6619718309859155)
(1700, 0.6912242686890574)
(1800, 0.72156013001083419)
(1900, 0.71722643553629473)
(2000, 0.78439869989165767)
(2100, 0.83748645720476711)
(2200, 0.84507042253521125)
(2300, 0.83423618634886243)
(2400, 0.90357529794149516)
(2500, 0.90465872156013005)
(2600, 0.90465872156013005)
(2700, 0.9263271939328277)
(2800, 0.93824485373781152)
(2900, 0.93282773564463706)
(3000, 0.94907908992416035)
(3100, 0.94366197183098588)
(3200, 0.93824485373781152)
(3300, 0.96424702058504874)
(3400, 0.96099674972914406)
(3500, 0.9707475622968581)
(3600, 0.94474539544962077)
(3700, 0.97616468039003246)
(3800, 0.95882990249187428)
(3900, 0.97724810400866735)
(4000, 0.97508125677139756)
(4100, 0.98374864572047671)
(4200, 0.98158179848320692)
(4300, 0.99133261105092096)
(4400, 0.97616468039003246)
(4500, 0.971830985915493)
(4600, 0.9848320693391116)
(4700, 0.99241603466955575)
(4800, 0.99349945828819064)
(4900, 0.99674972914409532)

In [15]:
# And now for some fizz buzz
numbers = np.arange(1, 101)
teX = np.transpose(binary_encode(numbers, NUM_DIGITS))
teY = sess.run(predict_op, feed_dict={X: teX})

output = np.vectorize(fizz_buzz)(numbers, teY)
print output


['1' '2' 'fizz' '4' '5' '6' '7' '8' 'fizz' '10' '11' 'fizz' '13' '14'
 'fizzbuzz' '16' '17' 'fizz' '19' 'buzz' 'fizz' '22' '23' 'fizz' 'fizz'
 '26' 'fizz' '28' '29' 'fizzbuzz' '31' '32' 'fizz' '34' '35' '36' '37' '38'
 '39' '40' '41' '42' '43' '44' 'fizzbuzz' '46' '47' 'fizz' '49' '50' 'fizz'
 '52' 'fizz' '54' 'fizz' '56' 'fizz' '58' '59' 'fizzbuzz' '61' '62' 'fizz'
 '64' 'buzz' 'fizz' '67' '68' 'fizz' 'buzz' '71' '72' '73' '74' 'fizzbuzz'
 '76' '77' 'fizz' '79' '80' 'fizz' '82' 'fizz' 'fizz' 'buzz' '86' 'fizz'
 '88' '89' 'fizzbuzz' '91' '92' 'fizz' '94' 'buzz' '96' '97' 'buzz' '99'
 'buzz']
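
Two NumPy details in the cell above are worth noting (the mini-examples below are illustrations, not part of the original notebook). First, binary_encode also works on a whole array at once, because >> and & broadcast; it then returns one row per bit, which is why the result is transposed. Second, np.vectorize turns the scalar fizz_buzz function into one that maps over arrays.

# binary_encode on an array gives shape (NUM_DIGITS, len(numbers)); transpose it
print(np.transpose(binary_encode(np.arange(1, 4), NUM_DIGITS)).shape)  # (3, 10)

# np.vectorize maps a scalar function over arrays element-wise
double = np.vectorize(lambda n: n * 2)
print(double(np.array([1, 2, 3])))  # [2 4 6]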

In [16]:
sess.close() # don't forget to close the session once you no longer need it, or use the *with* statement.

In [17]:
# Let's check the quality
Y = np.array([fizz_buzz_encode(i) for i in range(1,101)])
print "accuracy", np.mean(np.argmax(Y, axis=1) == teY)

for i in range(1,101):
    actual = fizz_buzz(i, np.argmax(fizz_buzz_encode(i)))
    predicted = output[i-1]
    ok = actual == predicted
    print i, "{:>8}".format(actual), "{:>8}".format(predicted), ok


accuracy 0.81
1        1        1 True
2        2        2 True
3     fizz     fizz True
4        4        4 True
5     buzz        5 False
6     fizz        6 False
7        7        7 True
8        8        8 True
9     fizz     fizz True
10     buzz       10 False
11       11       11 True
12     fizz     fizz True
13       13       13 True
14       14       14 True
15 fizzbuzz fizzbuzz True
16       16       16 True
17       17       17 True
18     fizz     fizz True
19       19       19 True
20     buzz     buzz True
21     fizz     fizz True
22       22       22 True
23       23       23 True
24     fizz     fizz True
25     buzz     fizz False
26       26       26 True
27     fizz     fizz True
28       28       28 True
29       29       29 True
30 fizzbuzz fizzbuzz True
31       31       31 True
32       32       32 True
33     fizz     fizz True
34       34       34 True
35     buzz       35 False
36     fizz       36 False
37       37       37 True
38       38       38 True
39     fizz       39 False
40     buzz       40 False
41       41       41 True
42     fizz       42 False
43       43       43 True
44       44       44 True
45 fizzbuzz fizzbuzz True
46       46       46 True
47       47       47 True
48     fizz     fizz True
49       49       49 True
50     buzz       50 False
51     fizz     fizz True
52       52       52 True
53       53     fizz False
54     fizz       54 False
55     buzz     fizz False
56       56       56 True
57     fizz     fizz True
58       58       58 True
59       59       59 True
60 fizzbuzz fizzbuzz True
61       61       61 True
62       62       62 True
63     fizz     fizz True
64       64       64 True
65     buzz     buzz True
66     fizz     fizz True
67       67       67 True
68       68       68 True
69     fizz     fizz True
70     buzz     buzz True
71       71       71 True
72     fizz       72 False
73       73       73 True
74       74       74 True
75 fizzbuzz fizzbuzz True
76       76       76 True
77       77       77 True
78     fizz     fizz True
79       79       79 True
80     buzz       80 False
81     fizz     fizz True
82       82       82 True
83       83     fizz False
84     fizz     fizz True
85     buzz     buzz True
86       86       86 True
87     fizz     fizz True
88       88       88 True
89       89       89 True
90 fizzbuzz fizzbuzz True
91       91       91 True
92       92       92 True
93     fizz     fizz True
94       94       94 True
95     buzz     buzz True
96     fizz       96 False
97       97       97 True
98       98     buzz False
99     fizz       99 False
100     buzz     buzz True

Conclusion

Using TensorFlow to solve Fizz Buzz is overkill and not very accurate.
But it is fun and a nice way to learn TensorFlow!

Feedback welcome @dh7net